Improve take performance on List arrays#9643

Merged: alamb merged 2 commits into apache:main from AdamGS:adamg/list-take-perf-improvement on Apr 16, 2026

@AdamGS AdamGS commented Apr 1, 2026

Which issue does this PR close?

  • Closes #NNN.

Rationale for this change

This PR builds on top of #9626, improving the results on those benchmarks.

What changes are included in this PR?

  1. Similar to #9625 ("Improve take_bytes perf in the null cases between 10-25%"), branch the function into null and non-null paths.
  2. Copy the list elements in a single pass while building the offsets, allocating less intermediate state.
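
As a rough illustration of the two changes listed above, here is a minimal sketch on plain vectors. It is not the code from this PR: `ListData`, `take_list_non_null`, and `take_list_nullable` are hypothetical names, the offsets are fixed to `i32`, and only null indices are modeled (null input values are left out).

```rust
/// Simplified stand-in for a list array: `offsets[i]..offsets[i + 1]` is the
/// range of `values` that belongs to list `i`. Hypothetical type, not arrow-rs API.
struct ListData {
    offsets: Vec<i32>,
    values: Vec<i32>,
}

/// Non-null path: every index is valid, so the loop carries no validity checks.
fn take_list_non_null(list: &ListData, indices: &[usize]) -> ListData {
    let mut offsets = Vec::with_capacity(indices.len() + 1);
    let mut values = Vec::new();
    offsets.push(0);
    for &i in indices {
        let start = list.offsets[i] as usize;
        let end = list.offsets[i + 1] as usize;
        // Copy the child elements and record the new end offset in the same pass.
        values.extend_from_slice(&list.values[start..end]);
        offsets.push(values.len() as i32);
    }
    ListData { offsets, values }
}

/// Nullable path: a `None` index produces an empty output slot marked invalid.
fn take_list_nullable(list: &ListData, indices: &[Option<usize>]) -> (ListData, Vec<bool>) {
    let mut offsets = Vec::with_capacity(indices.len() + 1);
    let mut values = Vec::new();
    let mut validity = Vec::with_capacity(indices.len());
    offsets.push(0);
    for idx in indices {
        if let Some(i) = *idx {
            let start = list.offsets[i] as usize;
            let end = list.offsets[i + 1] as usize;
            values.extend_from_slice(&list.values[start..end]);
        }
        validity.push(idx.is_some());
        // For a null index nothing was copied, so the offset simply repeats.
        offsets.push(values.len() as i32);
    }
    (ListData { offsets, values }, validity)
}
```

Keeping the two paths separate means the common all-valid case pays nothing for null handling, and neither path builds an intermediate list of child indices before copying.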

Are these changes tested?

Added a few tests for sliced list arrays.
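
For context on why sliced inputs deserve their own tests: a sliced list array sees only a window of the offsets buffer, so its first offset is generally not zero while the child values are still addressed by absolute position. The sketch below models that with plain slices; `take_lists` is a hypothetical helper, not a function from arrow-rs.

```rust
/// Take lists by index. `offsets` may be a window into a larger offsets
/// buffer, so `offsets[0]` need not be 0; it still indexes `values` absolutely.
fn take_lists(offsets: &[i32], values: &[i32], indices: &[usize]) -> (Vec<i32>, Vec<i32>) {
    let mut new_offsets = vec![0i32];
    let mut new_values = Vec::new();
    for &i in indices {
        let start = offsets[i] as usize;
        let end = offsets[i + 1] as usize;
        new_values.extend_from_slice(&values[start..end]);
        // Output offsets always restart at zero, whatever the input window was.
        new_offsets.push(new_values.len() as i32);
    }
    (new_offsets, new_values)
}
```

With parent offsets `[0, 2, 2, 5]`, the slice `&parent_offsets[1..]` describes the lists `[]` and `[3, 4, 5]`; code that assumed the first offset was zero would read the wrong child range.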

Are there any user-facing changes?

No

AdamGS commented Apr 1, 2026

Results on the benchmarks in #9626:

take list i32 512       time:   [4.4872 µs 4.5048 µs 4.5246 µs]
                        change: [−12.029% −11.670% −11.245%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

take list i32 1024      time:   [8.1540 µs 8.1715 µs 8.1891 µs]
                        change: [−24.814% −22.002% −19.215%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

take list i32 null values 1024
                        time:   [5.5799 µs 5.6028 µs 5.6273 µs]
                        change: [−11.178% −4.1193% +8.6975%] (p = 0.67 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

take list i32 null indices 1024
                        time:   [7.9070 µs 7.9327 µs 7.9632 µs]
                        change: [−80.594% −80.504% −80.409%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

take list i32 null values null indices 1024
                        time:   [5.3172 µs 5.3387 µs 5.3660 µs]
                        change: [−14.330% −13.956% −13.587%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
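
To read the `change` rows as speedups: assuming the percentage is the relative change in run time against the previous baseline (negative meaning faster), a change of c% corresponds to a factor of 1 / (1 + c/100). The helper below is a hypothetical illustration of that arithmetic, not part of criterion.

```rust
/// Convert a relative time change in percent (negative = faster) into a
/// speedup factor. Hypothetical helper for reading the numbers above.
fn speedup_from_change(change_percent: f64) -> f64 {
    1.0 / (1.0 + change_percent / 100.0)
}
```

Under that reading, the −80.504% midpoint for "take list i32 null indices 1024" is roughly a 5.1x speedup, and the −11.670% midpoint for "take list i32 512" is roughly 1.13x.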

@github-actions github-actions bot added the arrow Changes to the arrow crate label Apr 1, 2026
@AdamGS AdamGS force-pushed the adamg/list-take-perf-improvement branch from 8b7e3be to c66bad6 Compare April 8, 2026 11:48
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
@AdamGS AdamGS force-pushed the adamg/list-take-perf-improvement branch from c66bad6 to 95d54ac Compare April 8, 2026 11:53
alamb commented Apr 16, 2026

run benchmark take_kernels

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4259726053-1360-r46bp 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing adamg/list-take-perf-improvement (95d54ac) to aac969d (merge-base) diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench take_kernels
BENCH_FILTER=
Results will be posted here when complete


@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                                     adamg_list-take-perf-improvement       main
-----                                                                     --------------------------------       ----
take bool 1024                                                            1.00   1031.9±1.27ns        ? ?/sec    1.00   1032.5±0.71ns        ? ?/sec
take bool 512                                                             1.00    573.0±2.65ns        ? ?/sec    1.00    572.6±0.47ns        ? ?/sec
take bool null indices 1024                                               1.10    851.0±8.27ns        ? ?/sec    1.00   770.8±28.21ns        ? ?/sec
take bool null values 1024                                                1.02      2.1±0.03µs        ? ?/sec    1.00      2.0±0.00µs        ? ?/sec
take bool null values null indices 1024                                   1.10  1614.0±14.89ns        ? ?/sec    1.00  1461.9±14.53ns        ? ?/sec
take check bounds i32 1024                                                1.00    657.6±2.35ns        ? ?/sec    1.01    665.6±0.95ns        ? ?/sec
take check bounds i32 512                                                 1.17    457.1±4.85ns        ? ?/sec    1.00    389.5±0.65ns        ? ?/sec
take fsb value len: 12, indices: 1024                                     1.00      2.7±0.16µs        ? ?/sec    1.00      2.7±0.16µs        ? ?/sec
take fsb value len: 12, null values, indices: 1024                        1.00      3.7±0.16µs        ? ?/sec    1.00      3.7±0.16µs        ? ?/sec
take fsb value optimized len: 16, indices: 1024                           1.06    730.3±2.17ns        ? ?/sec    1.00    692.1±4.55ns        ? ?/sec
take fsb value optimized len: 16, null values, indices: 1024              1.02   1786.7±2.58ns        ? ?/sec    1.00   1755.1±3.23ns        ? ?/sec
take i32 1024                                                             1.00    514.0±0.92ns        ? ?/sec    1.02    525.3±0.39ns        ? ?/sec
take i32 512                                                              1.01    355.5±4.15ns        ? ?/sec    1.00    351.7±0.95ns        ? ?/sec
take i32 null indices 1024                                                1.02    872.2±1.27ns        ? ?/sec    1.00    857.3±0.95ns        ? ?/sec
take i32 null values 1024                                                 1.00   1544.9±4.51ns        ? ?/sec    1.01   1564.8±2.53ns        ? ?/sec
take i32 null values null indices 1024                                    1.02   1719.0±4.28ns        ? ?/sec    1.00   1678.0±1.84ns        ? ?/sec
take list i32 1024                                                        1.00      7.8±0.04µs        ? ?/sec    1.65     12.9±0.23µs        ? ?/sec
take list i32 512                                                         1.00      4.3±0.03µs        ? ?/sec    1.68      7.2±0.02µs        ? ?/sec
take list i32 null indices 1024                                           1.00      9.4±0.05µs        ? ?/sec    6.40     60.0±0.26µs        ? ?/sec
take list i32 null values 1024                                            1.00      5.7±0.01µs        ? ?/sec    1.35      7.7±0.05µs        ? ?/sec
take list i32 null values null indices 1024                               1.00      6.8±0.06µs        ? ?/sec    1.17      8.0±0.02µs        ? ?/sec
take listview i32 1024                                                    1.00   1333.9±2.40ns        ? ?/sec    1.07   1433.6±4.40ns        ? ?/sec
take listview i32 512                                                     1.00    949.4±2.23ns        ? ?/sec    1.03    977.4±2.34ns        ? ?/sec
take listview i32 null indices 1024                                       1.00   1913.2±3.06ns        ? ?/sec    1.05      2.0±0.00µs        ? ?/sec
take listview i32 null values 1024                                        1.01      2.3±0.00µs        ? ?/sec    1.00      2.3±0.00µs        ? ?/sec
take listview i32 null values null indices 1024                           1.06      2.9±0.00µs        ? ?/sec    1.00      2.7±0.00µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.00     17.0±0.07µs        ? ?/sec    1.02     17.2±0.07µs        ? ?/sec
take str 1024                                                             1.00      8.4±0.05µs        ? ?/sec    1.00      8.4±0.04µs        ? ?/sec
take str 512                                                              1.02      4.0±0.02µs        ? ?/sec    1.00      3.9±0.02µs        ? ?/sec
take str null indices 1024                                                1.01      5.8±0.02µs        ? ?/sec    1.00      5.7±0.02µs        ? ?/sec
take str null indices 512                                                 1.00      2.7±0.01µs        ? ?/sec    1.00      2.7±0.02µs        ? ?/sec
take str null values 1024                                                 1.00      6.4±0.03µs        ? ?/sec    1.00      6.4±0.02µs        ? ?/sec
take str null values null indices 1024                                    1.00      5.3±0.01µs        ? ?/sec    1.06      5.6±0.01µs        ? ?/sec
take stringview 1024                                                      1.00    909.1±2.66ns        ? ?/sec    1.08    982.6±2.99ns        ? ?/sec
take stringview 512                                                       1.00    557.3±1.51ns        ? ?/sec    1.00    557.6±1.30ns        ? ?/sec
take stringview null indices 1024                                         1.00    914.2±5.87ns        ? ?/sec    1.00    916.1±2.74ns        ? ?/sec
take stringview null indices 512                                          1.01    565.0±1.85ns        ? ?/sec    1.00    560.7±1.12ns        ? ?/sec
take stringview null values 1024                                          1.00   1895.0±1.62ns        ? ?/sec    1.01   1907.8±5.24ns        ? ?/sec
take stringview null values null indices 1024                             1.04   1753.1±1.78ns        ? ?/sec    1.00   1685.2±3.03ns        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 374.3s
Peak memory 1.7 GiB
Avg memory 1.7 GiB
CPU user 373.3s
CPU sys 0.8s
Peak spill 0 B

branch

Metric Value
Wall time 371.4s
Peak memory 1.7 GiB
Avg memory 1.7 GiB
CPU user 371.1s
CPU sys 0.2s
Peak spill 0 B

alamb commented Apr 16, 2026

group                                                                     adamg_list-take-perf-improvement       main
-----                                                                     --------------------------------       ----
...
take list i32 1024                                                        1.00      7.8±0.04µs        ? ?/sec    1.65     12.9±0.23µs        ? ?/sec
take list i32 512                                                         1.00      4.3±0.03µs        ? ?/sec    1.68      7.2±0.02µs        ? ?/sec
take list i32 null indices 1024                                           1.00      9.4±0.05µs        ? ?/sec    6.40     60.0±0.26µs        ? ?/sec
take list i32 null values 1024                                            1.00      5.7±0.01µs        ? ?/sec    1.35      7.7±0.05µs        ? ?/sec
take list i32 null values null indices 1024                               1.00      6.8±0.06µs        ? ?/sec    1.17      8.0±0.02µs        ? ?/sec
...

🚀

@alamb alamb left a comment

Thank you for this @AdamGS -- I went through it carefully. I have some ideas on potential ways to make it faster still, but we can perhaps do that as a follow-on PR.

Comment thread arrow-select/src/take.rs Outdated
}
}
Some(output_nulls) => {
new_offsets.resize(indices.len() + 1, OffsetType::Native::zero());
Contributor

Why initialize the offsets to zero, when they are immediately overwritten?

You could probably use push and extend instead of fill and setting offsets directly

Contributor Author

I assumed that this would be a nice way to avoid unsafe while keeping performance, but locally perf with push seems to be just as good.
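
The two approaches discussed in this thread can be compared on a plain `Vec<i32>`. This is only a sketch under simplifying assumptions: both function names are hypothetical, and the input is a slice of list lengths rather than the indices and offsets used in `take.rs`.

```rust
/// Variant A: pre-size the buffer with zeros, then overwrite each slot by index.
fn offsets_resize_then_set(lengths: &[i32]) -> Vec<i32> {
    let mut offsets: Vec<i32> = Vec::new();
    offsets.resize(lengths.len() + 1, 0);
    let mut end = 0i32;
    for (i, len) in lengths.iter().enumerate() {
        end += *len;
        offsets[i + 1] = end;
    }
    offsets
}

/// Variant B: reserve capacity once, then push. No zero fill and no unsafe.
fn offsets_push(lengths: &[i32]) -> Vec<i32> {
    let mut offsets = Vec::with_capacity(lengths.len() + 1);
    offsets.push(0);
    let mut end = 0i32;
    for len in lengths {
        end += *len;
        offsets.push(end);
    }
    offsets
}
```

Both variants produce the same offsets; variant B skips the initial zero fill because every slot is written exactly once by `push`.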

Comment thread arrow-select/src/take.rs
)
}

#[test]
Contributor

It would make it easier to see what you have changed if you didn't also move the tests around

Contributor Author

no idea why I did that

AdamGS commented Apr 16, 2026

I'll address all the comments later today, should be ready by tomorrow

alamb commented Apr 16, 2026

run benchmark take_kernels

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4261494662-1379-xwz99 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing adamg/list-take-perf-improvement (57df7c6) to aac969d (merge-base) diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench take_kernels
BENCH_FILTER=
Results will be posted here when complete


@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                                     adamg_list-take-perf-improvement       main
-----                                                                     --------------------------------       ----
take bool 1024                                                            1.00   1033.9±0.59ns        ? ?/sec    1.01   1039.3±1.30ns        ? ?/sec
take bool 512                                                             1.00    571.3±0.73ns        ? ?/sec    1.00    571.3±1.09ns        ? ?/sec
take bool null indices 1024                                               1.09    847.8±8.22ns        ? ?/sec    1.00   780.2±29.70ns        ? ?/sec
take bool null values 1024                                                1.00      2.0±0.00µs        ? ?/sec    1.00      2.0±0.00µs        ? ?/sec
take bool null values null indices 1024                                   1.11  1614.7±12.37ns        ? ?/sec    1.00   1452.9±9.55ns        ? ?/sec
take check bounds i32 1024                                                1.00    657.8±2.14ns        ? ?/sec    1.01    666.3±1.04ns        ? ?/sec
take check bounds i32 512                                                 1.16    453.3±0.78ns        ? ?/sec    1.00    390.0±0.82ns        ? ?/sec
take fsb value len: 12, indices: 1024                                     1.00      2.7±0.19µs        ? ?/sec    1.00      2.7±0.19µs        ? ?/sec
take fsb value len: 12, null values, indices: 1024                        1.00      3.8±0.19µs        ? ?/sec    1.00      3.8±0.19µs        ? ?/sec
take fsb value optimized len: 16, indices: 1024                           1.00    688.1±1.40ns        ? ?/sec    1.01    697.5±1.28ns        ? ?/sec
take fsb value optimized len: 16, null values, indices: 1024              1.00   1750.6±1.28ns        ? ?/sec    1.00   1752.2±3.15ns        ? ?/sec
take i32 1024                                                             1.00    516.1±1.68ns        ? ?/sec    1.01    522.5±3.25ns        ? ?/sec
take i32 512                                                              1.00    353.0±0.66ns        ? ?/sec    1.00    354.7±3.29ns        ? ?/sec
take i32 null indices 1024                                                1.02    876.0±3.36ns        ? ?/sec    1.00    858.5±2.00ns        ? ?/sec
take i32 null values 1024                                                 1.11  1715.1±29.14ns        ? ?/sec    1.00   1548.9±6.78ns        ? ?/sec
take i32 null values null indices 1024                                    1.03   1714.7±1.60ns        ? ?/sec    1.00   1664.7±3.52ns        ? ?/sec
take list i32 1024                                                        1.00      7.8±0.03µs        ? ?/sec    1.66     12.9±0.03µs        ? ?/sec
take list i32 512                                                         1.00      4.3±0.01µs        ? ?/sec    1.68      7.2±0.02µs        ? ?/sec
take list i32 null indices 1024                                           1.00      9.7±0.10µs        ? ?/sec    6.28     60.6±0.48µs        ? ?/sec
take list i32 null values 1024                                            1.00      5.7±0.02µs        ? ?/sec    1.34      7.7±0.01µs        ? ?/sec
take list i32 null values null indices 1024                               1.00      6.8±0.05µs        ? ?/sec    1.18      8.0±0.01µs        ? ?/sec
take listview i32 1024                                                    1.02   1334.2±2.22ns        ? ?/sec    1.00   1310.8±7.29ns        ? ?/sec
take listview i32 512                                                     1.00    952.1±2.04ns        ? ?/sec    1.14   1084.0±2.29ns        ? ?/sec
take listview i32 null indices 1024                                       1.00   1972.9±3.14ns        ? ?/sec    1.01   1995.7±4.14ns        ? ?/sec
take listview i32 null values 1024                                        1.00      2.3±0.00µs        ? ?/sec    1.03      2.4±0.00µs        ? ?/sec
take listview i32 null values null indices 1024                           1.00      2.8±0.00µs        ? ?/sec    1.00      2.8±0.01µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.00     17.0±0.07µs        ? ?/sec    1.02     17.3±0.07µs        ? ?/sec
take str 1024                                                             1.02      8.5±0.04µs        ? ?/sec    1.00      8.4±0.04µs        ? ?/sec
take str 512                                                              1.02      4.0±0.02µs        ? ?/sec    1.00      3.9±0.03µs        ? ?/sec
take str null indices 1024                                                1.00      5.7±0.02µs        ? ?/sec    1.00      5.7±0.02µs        ? ?/sec
take str null indices 512                                                 1.05      2.9±0.01µs        ? ?/sec    1.00      2.7±0.01µs        ? ?/sec
take str null values 1024                                                 1.01      6.4±0.03µs        ? ?/sec    1.00      6.4±0.02µs        ? ?/sec
take str null values null indices 1024                                    1.03      5.3±0.01µs        ? ?/sec    1.00      5.2±0.01µs        ? ?/sec
take stringview 1024                                                      1.00    753.4±4.08ns        ? ?/sec    1.31    983.4±1.51ns        ? ?/sec
take stringview 512                                                       1.00    472.8±1.64ns        ? ?/sec    1.18    557.6±1.12ns        ? ?/sec
take stringview null indices 1024                                         1.00    917.0±1.36ns        ? ?/sec    1.00    919.2±1.35ns        ? ?/sec
take stringview null indices 512                                          1.00    561.2±1.12ns        ? ?/sec    1.00    563.7±1.17ns        ? ?/sec
take stringview null values 1024                                          1.00   1802.9±1.35ns        ? ?/sec    1.05   1901.7±3.00ns        ? ?/sec
take stringview null values null indices 1024                             1.03   1732.1±2.31ns        ? ?/sec    1.00   1676.1±5.41ns        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 371.3s
Peak memory 2.1 GiB
Avg memory 2.1 GiB
CPU user 370.4s
CPU sys 0.7s
Peak spill 0 B

branch

Metric Value
Wall time 376.5s
Peak memory 2.1 GiB
Avg memory 2.1 GiB
CPU user 376.3s
CPU sys 0.1s
Peak spill 0 B

AdamGS commented Apr 16, 2026

take list i32 1024                                                        1.00      7.8±0.03µs        ? ?/sec    1.66     12.9±0.03µs        ? ?/sec
take list i32 512                                                         1.00      4.3±0.01µs        ? ?/sec    1.68      7.2±0.02µs        ? ?/sec
take list i32 null indices 1024                                           1.00      9.7±0.10µs        ? ?/sec    6.28     60.6±0.48µs        ? ?/sec
take list i32 null values 1024                                            1.00      5.7±0.02µs        ? ?/sec    1.34      7.7±0.01µs        ? ?/sec
take list i32 null values null indices 1024                               1.00      6.8±0.05µs        ? ?/sec    1.18      8.0±0.01µs        ? ?/sec

🥳

@alamb alamb merged commit 89b1497 into apache:main Apr 16, 2026
26 checks passed
alamb commented Apr 16, 2026

Thanks @AdamGS

Labels: arrow (Changes to the arrow crate), performance